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Abstract 

Background: Mortality estimates can measure and monitor the impacts of conflict on a population, guide 
humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration 
and surveillance systems are rarely functional in conflict settings, posing a challenge of estimating mortality using 
retrospective population-based surveys. 

Results: We present a two-stage cluster sampling method for application in population-based mortality surveys. 
The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters 
in the first sampling stage and Google Earth ™ imagery and sampling grids to select households in the second 
sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors 
affecting feasibility and methodological quality are described. 

Conclusion: Sampling is a challenge in retrospective population-based mortality studies and alternatives that 
improve on the conventional approaches are needed. The sampling strategy presented here was designed to 
generate a representative sample of the Iraqi population while reducing the potential for bias and considering the 
context specific challenges of the study setting. This sampling strategy, or variations on it, are adaptable and should 
be considered and tested in other conflict settings. 

Keywords: Cluster sampling, Population-based survey, Mortality, Conflict, Iraq war, Geographic information system 
(GIS), Google Earth™ 



Background 

Monitoring civilian mortality in conflicts can help target 
humanitarian assistance and minimize loss of life among 
those caught up in conflict. While in stable situations 
surveillance of mortality using vital registration systems 
is the gold standard, these systems are rarely functional 
during conflicts and are often nonexistent where major 
conflicts occur [1-3]. Alternative approaches, such as 
passive surveillance through news media or press reports 
have been shown to under-record deaths and may be 
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distorted by political agendas [4]. Consequently, epide- 
miologists are often limited to estimating mortality using 
retrospective population-based surveys at the household 
level [5]. In these, a representative sample of consenting 
households is selected to assess mortality events over a 
given time period, and mortality rates, along with their 
upper and lower confidence intervals, are calculated [5]. 
These calculations can also be compared to other time 
points to estimate excess mortality related to a conflict. 
However, simple random and systematic random sam- 
pling methods are difficult during conflict given data un- 
availability and logistical and security constraints [6,7] . 

One alternative approach is to apply cluster sampling 
to estimate conflict-related mortality rates. Two-stage 
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cluster sampling was standardized in 1978 by the World 
Health Organization's Expanded Programme on 
Immunization (WHO EPI) to assess vaccine coverage 
and has since been extended to estimate conflict-related 
mortality in Iraq [8,9], Kosovo [10], the Democratic 
Republic of Congo [11], and Sudan [12]. This approach 
is relatively fast, can be done with limited financial and 
human resources, and exposure to unsafe areas can be 
limited. Additionally, a complete sampling frame is not 
required. These are all very important considerations in 
conflict settings. 

In conventional two-stage cluster sampling, the first 
sampling stage involves the selection of a predetermined 
number of clusters. Clusters are mutually exclusive subpo- 
pulations, most frequently constructed from recognized 
administrative boundaries [13]. Clusters are selected from 
a list of primary sampling units (e.g. census areas, town- 
ship boundaries) with the probability of selection propor- 
tional to population size (or estimated population size) 
[14]. In the second stage, starting households are selected 
from each cluster. As complete and adequate listings of 
households rarely exist, households are not selected from 
a sampling frame. Rather, they are selected by the survey 
team in the field based on a random procedure [15]. Most 
commonly, starting households are selected in the field 
based on the "random walk" method, which involves 
identifying the center of the cluster, or another easily 
distinguishable feature such as a main street, and selecting 
a random direction to walk, thus drawing a transect across 
the cluster. In practice, the random direction is 
often selected by "spinning a pen" [16]. Among those 
households that lie along the transect, one household is 
randomly selected as the starting household and a prede- 
termined number of next nearest households are surveyed. 
Ultimately, the data collected from each cluster are pooled 
to make inferences with respect to the target population 
and standard errors are adjusted for design effects of using 
a cluster sampling approach [6]. 

Despite the benefits of this WHO EPI-type cluster 
sampling, the validity of this approach has been ques- 
tioned [17]. Most criticisms are related to the potential 
for bias in the second stage when using the "random 
walk" approach, which has been shown to introduce bias 
if the household selection procedure is not in fact 
random [7,14,16,18,19]. In addition, this approach is 
subject to interviewer bias, whether conscious or 
unconscious, and can take a significant amount of time 
to implement in the field. Too much field time exposes 
survey team members to risk in conflict settings. It is 
also impossible to calculate the probability of selection 
at the household level, so the sample is not a true 
probability sample [14], 

To date, a few variations on this conventional cluster 
sampling approach have been developed for application 



in nation-wide health studies. Relevant examples that 
have been used and show promise for certain settings 
include compact segment sampling [14,20] and random 
spatial sampling using global positioning system (GPS) 
coordinates [16]. However, these approaches may not be 
appropriate in conflict settings. Compact segment sam- 
pling requires a significant amount of field time 
exposure and two visits to each cluster [15] while the 
use of GPS units is often a security risk in the context of 
modern warfare [9]. Variations on the conventional two- 
stage cluster sampling designed for nation-wide mortal- 
ity estimates in conflict settings are needed to generate 
accurate and useful mortality estimates and to contrib- 
ute to theoretical and practical advances in the field of 
conflict epidemiology [21]. 

This paper presents a two-stage cluster sampling 
method implemented in a retrospective mortality study 
in Iraq. Our goal was to develop a cluster-based sam- 
pling method while taking into consideration the specific 
challenges of conflict settings. Our cluster sampling uses 
a gridded population dataset and a spatial sampling 
algorithm in a geographic information system (GIS) to 
select clusters in the first stage. Starting households are 
selected in the second stage using imagery and a 
sampling grid in Google Earth ™ [22]. 

Methods 

We received University of Washington Institutional 
Review Board approval for the study, and also received 
approval from the Ministries of Health in Baghdad and 
in Kurdistan. Methods were reviewed to ensure they 
complied with the ethical guidelines for epidemiological 
research set out by the Council for International Organi- 
zations of Medical Sciences. An ethicist experienced in 
international research associated with the Institute of 
Translational Health Sciences at the University of Wash- 
ington further reviewed the protocols to ensure the 
safety of subjects and interviewers was adequately 
protected. Additionally, Simon Fraser University's 
Research Ethics Board approved the use of secondary 
data for this project. 

Data and Tools 

The sampling method uses a gridded population dataset 
in the first stage of sampling. As a preliminary step, we 
reviewed a number of spatial population datasets. To 
date, three high resolution global gridded population 
datasets have been generated and used in epidemio- 
logical studies: the Gridded Population of the World 
(GPWv3) [23], Global Rural-Urban Mapping Project 
(GRUMPvl) [23], and LandScan™ [24]. These datasets 
use different interpolation methods to generate gridded 
population counts (see Table 1). These differences are 
important in selecting the most appropriate spatial 
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population dataset Each dataset is publically available 
(for research and public health purposes), accessible on- 
line, and can be easily integrated into most GIS 
platforms. If available, an alternative is to use country- 
specific datasets, which exist for many countries in 
Africa (see Afripop Project, 2011 and United Nations 
Environment Programme Gridded Population Data- 
bases) [25,26]. We selected the LandScan™ dataset for 
reasons described below. 

In our approach, the first cluster sampling stage uses the 
'Create Spatially Balanced Points' (CSBP) function in the 
ArcGIS (vlO) software platform. This tool uses a spatial 
sampling algorithm based on the work of Theobold et al. 
(2007) [31] and Stevens & Olsen (2004) [32]. It uses a 
probability surface depicting relative probabilities of 
inclusion and a Reversed Randomized Quadrant-Recursive 
Raster (RRQRR) algorithm to randomly generate a set of 
spatially balanced points [31,32]. The probabilities of 
inclusion can be based on any relevant attribute, but the 
use of population size enables the application of a 
probability proportional to estimated size (PPES) approach. 
When using population data to generate the probabilities of 
inclusion, a sample that mimics the distribution of the 
target population is generated. This allows researchers to 
analyze the final dataset without weighting or other 
constructs to create a full population estimate. 

Administrative boundaries are required to implement the 
sampling strategy presented here. Current and spatially 
referenced administrative boundaries (at the country, 
provincial, and district scale) can be downloaded from the 
Global Administrative Areas website in shapefile format for 
nearly all nations in the world [33]. 

Imagery in the Google Earth ™ platform is used in 
the second stage of the sampling method. Google Earth 



maps the earth by the superimposition of images 
obtained from satellite imagery and aerial photography 
(images from airplanes, kites and balloons) [34]. This is 
a particularly useful tool for public health and conflict 
epidemiology as it has no financial cost, is easy to use, 
and can interact with other mapping technologies. 
Although the resolution, quality, and age of the imagery 
varies across the globe, it is generally possible to identify 
individual household rooftops. 

Gridded Population Dataset Selection 

We selected the 2008 LandScan™ gridded population 
dataset to depict the population of Iraq. The 2008 
LandScan™ population data were obtained from the 
Oak Ridge National Laboratory [24]. This dataset was 
selected over others for these reasons: 

• Theory: We preferred the "smart interpolation" 
approach over the areal weighting approach for the 
disaggregation of sub-national population counts to 
grid units. Smart interpolation uses numerous 
sources of ancillary data (i.e. land cover, road 
network, slope, etc.) and does not assume that 
populations are uniformly distributed across space 
within administrative units [35]. 

• Timeliness: The 2008 LandScan™ dataset offered 
the most up to date spatial population dataset 
available. Both GRUMP and GPW were released in 
2000. Since a census has not been conducted in Iraq 
for decades, all population datasets are based on 
out-dated census information. However, the ancillary 
data used to disaggregate population data, land 
cover data for example, is most recent for the 
LandScan™ dataset. 



Table 1 Overview of gridded population datasets currently available 



Dataset Provider Spatial Input population Interpolation Ancillary data Year(s) 

(website) resolution data source method 



GPWv3.0 


CIESIN 

(http://sedac.desi n. 
columbia.edu/gpw/) 


2.5' 

(~5 km 2 ) 


UNPD census data 


Areal 

weighting 1 


-None 


1990,1995, 2000, 
2005 (projection), 
2010 (projection), 
2015 (projection) 


GRUMPvl 


CIESIN 

(http://sedac.ciesin. 
columbia.edu/gpw/) 


.5' 

(~1 km 2 ) 


UNPD census data 


Dasymetric 
mapping 2 


-Night-time light imagery 
-Populated places 


2000 


LandScan™ 


ORNL 

(http://www.ornl.gov/sci/ 
landscan/) 


.5' 

(~1 km 2 ) 


Population Division 
of the U.S. Census 
Bureau 


Smart 

interpolation 3 


-Land cover 
-Road networks 
-Digital elevation models 
-Slope 

-Satellite imagery 


2008 



Adapted from [27]. 

1 Areal weighting overlays a grid onto sub national administrative unit population data and distributes the population across space according to the proportion of 
the administrative unit area that is contained within the grid cell [28]. 

2 Dasymetric mapping disaggregates sub national population estimates into grid units using ancillary data such as road networks [28,29]. 

3 Smart interpolation disaggregates sub national population estimates to grid cells according to likelihood co-efficients of population occurrence derived from 
ancillary data such as proximity to roads, slope, land cover [30]. 
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• Validation in study area: Mubarekas 2008 study 
conducted in Northern Iraq found the LandScan™ 
dataset correlated with settlements and population 
distribution on the ground [36]. 

• Validation in other conflict settings and limited 
resource areas: A review entitled "Tools and 
Methods for Estimating Populations at Risk from 
Natural Disasters and Complex Humanitarian 
Crises" recommended that the LandScan™ 
population estimates tend to be better than other 
population sources in countries where the census 
data are spatially coarse and not recent, which is the 
case in Iraq [37]. 

Stage 1- Cluster Selection 

In the first stage of sampling, we used the 2008 
LandScan™ gridded population dataset and the CSBP 
tool to randomly select a sample of clusters weighted by 
estimated population size. The 2008 LandScan™ 
gridded population dataset was downloaded in ESRI Grid 
format at the global scale, masked to the spatial extent of 
the Iraq administrative boundary and converted to a 
density grid. Using this raster (see Figure 1), we 
identified those grid cells with 25 people per km 2 or 
fewer. These grid cells were assigned a probability of zero 



as they were unlikely to contain the minimum 20 house- 
holds required for the survey design. For all other grid 
cells, we standardized the population dataset to create a 
probability surface. The probability surface is a raster 
layer with values ranging from 0 to 1, indicating the 
probability of inclusion for each grid cell. Higher values 
indicate a higher probability of inclusion. Here the prob- 
ability of inclusion is based on population estimates 
according to LandScan™ 2008 data. 

We then used the CSBP tool to randomly generate 
125 points according to the probability surface and the 
RRQRR algorithm. The grid cells containing one of 
the 125 points were then selected as possible clusters 
and numbered 1-125. Next, the 125 selected clusters 
were exported as a KML (Keyhole Markup Language) 
file for use in the Google Earth ™ platform. Using the 
Google Earth ™ satellite imagery, we visually exam- 
ined all 125 clusters to identify any that were clearly 
not residential areas. Among these original 125 clus- 
ters, 12 were obviously either industrial areas, com- 
mercial areas, or otherwise not residential and were 
thus excluded from the initial sample set. Using the 
remaining 113 suitable clusters, we numbered the first 
100 and held the remaining 13 clusters as a set of 
"backup" clusters that could replace any cluster 
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deemed unsafe or otherwise inaccessible at the time of 
the survey. 

Stage 2- Starting Household Selection 

In the second stage of sampling, we needed to select a 
starting household in each of the first-stage clusters. The 
starting households were randomly selected using a 
sampling grid superimposed over satellite imagery in the 
Google Earth ™ platform (see Figure 2). Sampling grids 
were generated in ArcGIS at a resolution of 10 meters 
by 10 meters (10 m by 10 m) that corresponded to the 
spatial extent of each cluster. A resolution of 10 m by 
10 m was selected for the sampling grid to approximate 
the extent of a single rooftop. These sampling grids were 
exported as KML files and subsequently superimposed 
onto satellite imagery in Google Earth™. We assigned 
each 10 m by 10 m cell within the sampling grid a 
unique number, enabling the selection of a single grid 
cell using a random number generator. If that grid cell 
contained a household, ascertained through visual as- 
sessment of satellite imagery, that household was 
marked using the "Add Placemark" tool. If the grid cell 
did not contain a household, we moved to the next 
randomly selected grid cells until a cell containing a 
household was selected. In the event that more than a 
single rooftop existed within the selected grid cell, the 
household with a greater proportion of its rooftop con- 
tained within the cell was selected. The same steps were 
followed to select "back-up" starting households in each 
cluster in the event that a household no longer existed, 
was not a residential building, or was not accessible for 
security reasons. 



Results and Discussion 

Application: Iraq 201 1 University Collaborative Mortality 
Study 

This sampling method was designed for a retrospective 
population-based study aimed at estimating mortality in 
Iraq: the Iraq 2011 University Collaborative Mortality. 
This study was carried out in 2011 and was designed to 
update and extend earlier mortality estimates published 
in 2004 [9], 2006 [8], and 2008 [38]. The 2011 Iraq 
mortality study used a standard household demographic 
method and a sibling survival technique [15,39]. Results 
of the Iraq 2011 University Collaborative Mortality 
Study are expected in 2012. A power calculation 
informed the original sampling design, leading to the 
selection of 100 clusters with 20 households per cluster, 
for a plan to sample 2,000 households. 

Survey Preparation and Implementation 

GPS devices could not be used to locate starting 
households as laws prohibiting the use of this technology 
by Iraqi civilians made this an unacceptable security risk. 
Consequently, printed maps were used. For each cluster, 
we created maps at different scales. To help survey 
teams locate households efforts were made to produce 
maps that included easily identifiable structures such as 
water towers, highways, or mosques. As part of five day 
initial field training, two days were spent in map reading 
and orienteering. Teams quickly became adept at 
locating clusters and households using maps, which was 
verified in a second training session. 

The 2011 Iraq mortality survey was carried out 
between May and July 2011 by four teams of Iraqi medical 





A B C 

Figure 2 Illustrative example of Starting Household selection and map used to locate Starting Household. A. Illustrative cluster; B. 
Illustrative cluster with sampling grid; C. Illustrative starting household. The cluster and selected household shown here are for illustrative 
purposes only to protect the identity of households that participated in the survey. The image of the sampling grid shows only a small 
proportion of the entire grid. The imagery is from Google Earth™ 
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doctors with training in family medicine and community 
medicine and previous household survey experience. There 
were two supervisors each managing and monitoring the 
work and progress of two of the survey teams. We were 
fortunate that one of our project supervisors was 
particularly enthusiastic and adept in the use of maps. We 
also employed a person in each governorate who had local 
knowledge of the landscape. Only one (0.01%) of the 100 
clusters selected for the original sample could not be visited 
because of safety concerns; this cluster was replaced with 
one of the remaining 13 "back-up" clusters from the same 
governorate. 

Feasibility and Methodological quality 

In designing a cluster survey trade-offs and contextual 
factors must be considered to balance feasibility and 
methodological quality [40]. As there are few alternatives 
to cluster samples in conflict situations our goal was to 
improve methodological quality in a practical and 
feasible field approach. We believe that this adapted 
two-stage cluster sampling method meets these goals. 

Feasibility 

The feasibility of doing research in conflict-settings is 
always balanced by the requirement for rigorous proto- 
cols [41]. Factors influencing feasibility include survey 
team safety, time required for research, resources and 
expertise needed, and financial costs. 

Safety of survey team: Safety of the survey team is not 
merely related to feasibility but is also an important 
ethical consideration [41]. In our study, the use of GPS 
to identify starting households was viewed as a 
significant safety threat; this is likely to be the case in 
other conflict settings. The maps allowed teams to 
move directly to the starting household without the 
extended exposure needed for the "random walk" 
method. Using a local facilitator to help locate starting 
households can reduce risk to the survey team. 
Time: The element of time must be considered in 
terms of both preparation time (generating a sample 
selection prior to field work) and the amount of time in 
the field. Selection of the clusters, which included 
gathering and preparing the data used, took 
approximately 1 week for a single researcher. The 
process of selecting households and preparing the maps 
required approximately 2 weeks for a single researcher. 
Time needed in the field tended to vary depending on 
the setting. If the starting household was located in an 
urban setting or near an identifiable landmark, the 
household was identified quickly and with ease. This 
task was more time consuming and difficult for remote 
settings. The use of a local person with knowledge of 
the territory facilitated the task of locating a starting 
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household. The ability to examine cluster and 
household locations before going into the field using 
Google Earth™ proved helpful to field teams, and in 
areas where access was restricted by checkpoints and 
barriers permitted scouts to locate the site a day in 
advance and guide the survey teams the following day. 
Resources and expertise: All of the datasets used are 
free and publically available for research purposes. The 
same is true for the Google Earth™. ArcGIS however, 
requires a purchased licence. While expertise in GIS 
software is needed to carry out the steps in the two 
first stages of sampling, comparably less training is 
required to orientate survey teams in map reading and 
orienteering. We recommend employing at least one 
ground-level supervisor with a knack for geography and 
maps. 

Financial cost: The financial costs related to the 
implementation of this sampling method are not large. 
Although the costs associated with the GIS software 
can be substantial, most institutions have some access 
to GIS software. Additionally, free or low-cost 
alternatives to ArcGIS could be considered (e.g. 
GRASS) [42,43]. Free and high-quality imagery from 
Google Earth™ produces substantial savings over 
commercial imagery providers. Additionally, using 
maps rather than purchasing GPS units is a cost 
savings approach. Even in situations where GPS units 
are not a security concern the use of printed maps is a 
low-cost alternative for localizing sites [44]. 

Methodological quality 

Tapp et al. [2008] present quality indicators for retro- 
spective mortality surveys in complex emergencies [45]. 
Two of the five indicators, coverage and bias, are specif- 
ically related to sampling design and are discussed below 
in relation to the method presented here. 

Coverage: Is the sample sufficiently representative of 
the underlying population of interest [45]? To answer 
this question, we examined the coverage of our sample 
in terms of regional distribution across administrative 
units and urban-rural status in comparison to the Iraqi 
population in general (see Table 2). Regional 
distribution and urban-rural status are important 
factors, as violence and therefore mortality risk are 
presumably influenced by both. We examined the 
percentage of total clusters and households in each 
governorate (an administrative unit akin to a province 
or state) compared to the percentage of the Iraqi 
population in each governorate. These comparisons 
suggest that our sample sufficiently captures the 
regional distribution of the underlying Iraqi population. 
The extent of urban-rural coverage of our sample was 
assessed by identifying those clusters located in urban 
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regions and those located in rural regions and 
comparing the urban-rural proportion of our clusters 
to the national urban-rural proportion. We classified 
clusters as urban or rural based on local knowledge of 
survey team supervisors as there were no adequate data 
available identifying urban- rural status at the local 
level. Urban regions are defined here as metropolitan 
cities, towns, or peri-urban areas while rural areas are 
defined as remote communities with low population 
density. Based on best available estimates, the urban- 
rural divide in Iraq is approximately 66 percent urban 
to 34 percent rural (UN population data [46], US 
census Bureau [47], and Iraq's Central Organization for 
Statistics and Information [48] all provide estimates 
between 66 percent and 67 percent) According to our 
classification of clusters as urban or rural, 31 percent of 
the sample is rural while 69 percent is urban. Our 
sample has slightly over-sampled the urban population, 
which is to be expected as we excluded the very remote 
areas in our sampling strategy to ensure that cluster 
locations had at least 20 households. Nonetheless, the 
urban-rural coverage in our study is very close to the 
estimated national average. 

Bias: As Tapp et al. (2008) highlight [45] an important 
quality consideration for retrospective surveys is 
whether the population was sampled to avoid bias. 



The sampling method presented here was designed 
to reduce the potential for sampling bias by randomly- 
selecting both clusters and households prior to field 
work. We also took into account criticisms regarding 
the sampling strategy used in the 2006 Iraq mortality 
study, especially what some authors have referred to as 
a "main street bias." Using sampling grids enables sim- 
ple random selection of households to minimize the 
possibility of systematic selection of certain regions 
and further removes the possibility of conscious or 
unconscious interviewer bias by selecting households 
a priori. 

Nevertheless, the sampling method described here 
does have several potential sources of bias, mostly stem- 
ming from the population size from which clusters were 
defined [17]. The LandScan 2008™ population data for 
Iraq is not based on official 2008 census data, but rather 
uses estimated population figures derived from the most 
recent national census. A complete national census has 
not been conducted in Iraq since 1987 (although a 
partial census excluding Kurdish regions was carried out 
in 1997). A full Iraqi census has been planned since the 
early 2000 s but has been repeatedly postponed [48]. 
Both internal and external displacement have been 
significant in Iraq since the US -led invasion in 2003; it is 
unlikely that any extrapolated population estimate has 



Table 2 Sample selection clusters across Iraqi governorates for 2011 Iraq mortality study, compared to Iraq population 
estimates 



Governorates 


Estimated 
population 1 


Percentage of 
Iraqi population 


No. of 
clusters 


Percentage 
of sample 


Difference 
(pop% - sample%) 


No. of 
individuals 2 


Percentage 
of sample 


Difference 
(pop% - sample%) 


Al-Anbar 


1,451,583 


4.52 


7 


7 


-2.48 


990 


9.28 


-4.76 


Al-Muthanna 


719,824 


2.24 


1 


1 


1.24 


142 


1.33 


0.91 


Al-Najaf 


1,180,681 


3.68 


2 


2 


1.68 


200 


1.88 


1.80 


Al-Qadisiya 


1,121,782 


3.49 


4 


4 


-0.51 


580 


5.44 


-1.95 


Babil 


1,727,032 


5.38 


3 


3 


2.38 


353 


3.31 


2.07 


Baghdad 


7,180,889 


22.37 


23 


23 


-0.63 


2,347 


22.01 


0.36 


Basrah 


2,555,542 


7.96 


8 


8 


-0.04 


882 


8.27 


-0.31 


Diala 


1,370,537 


4.27 


5 


5 


-0.73 


463 


4.34 


-0.07 


Duhouk 


968,901 


3.02 


2 


2 


1.02 


284 


2.66 


0.36 


Erbil 


1,471,053 


4.58 


9 


9 


-4.42 


803 


7.53 


-2.95 


Kerbala 


1,003,516 


3.13 


2 


2 


1.13 


219 


2.05 


1.08 


Kirkuk 


1,290,072 


4.02 


2 


2 


2.02 


201 


1.89 


2.13 


Missan 


1,009,565 


3.14 


3 


3 


0.14 


308 


2.89 


0.25 


Ninevah 


3,237,918 


10.09 


13 


13 


-2.91 


1,298 


12.17 


-2.08 


Salahuddin 


1,259,298 


3.92 


3 


3 


0.92 


312 


2.93 


0.99 


Sulaimaniya 


1,551,974 


4.83 


7 


7 


-2.17 


663 


6.22 


-1.39 


Thi Qar 


1,846,788 


5.75 


3 


3 


2.75 


308 


2.89 


2.86 


Wasit 


1,158,033 


3.61 


3 


3 


0.61 


310 


2.91 


0.70 



Estimated 2009 population According to Central organization for statistics and Information Technology [COSIT] estimates. 
2 Total number of individuals based on reported household members. 
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accurately captured the complex patterns of population 
movement [49]. It is estimated that there are up to 2.5 
million Iraqis who have fled Iraq to neighbouring coun- 
tries and at least that number who have internally 
migrated principally due to violence [50]. It is difficult to 
estimate how the use of out-dated population figures 
and migration affects sampling and the potential for 
bias. It is likely that there are important implications, 
perhaps with mortality in the most violent regions 
underestimated, and violence in areas receiving intern- 
ally displaced being overstated. 

The construction method of the LandScan™ dataset 
may also introduce bias. The smart interpolation 
approach employed by LandScan™ uses ancillary data 
such as road networks and satellite imagery to define 
occupation probabilities for all grid cells in a raster grid 
[51]. Assumptions made in generating these grid cell 
occupation probabilities, that people are most likely to 
live along roads for example, may not reflect true 
population distribution on the ground in some areas. In 
the field we did not encounter problems due to this the- 
oretical limitation. 

Lastly, the use of Google Earth™ imagery may also 
introduce bias as images can be out-dated, thereby ex- 
cluding recent development. This is especially problem- 
atic if the newly developed communities have differential 
mortality experiences. In our study, imagery dates range 
from 2002 to 2011, with the majority (70%) from late 
2004. It is possible that households not captured in older 
imagery could be families fleeing more violent areas and 
therefore families with higher probability of mortality 
events. Google Earth™ satellite imagery may thereby se- 
lect households that underestimate mortality. It is also 
important to note that Google Earth™ satellite imagery 
does not cover every region of the globe with high reso- 
lution images such that individual rooftops may not be 
identifiable. If this is the case, alternative sources of im- 
agery may be needed. 

We should note the measures of bias are both objective 
and subjective. Our own surveyors expressed concerns that 
this study sample appeared to minimize the enumeration of 
mortality events by the choice of too many clusters in re- 
mote areas and potentially protected populations (such as 
oil worker company enclaves) and not capturing mortality 
events among families that fled the country. 

Conclusion 

For a variety of political and economic failures, conflict 
affects many populations around the world. Given 
increasing resource constraints, economic instability, 
dwindling oil supplies, and food and water stress due to 
climate change, global conflicts are not likely to de- 
crease. There is a dearth of published work regarding 
both the population impacts of conflict and appropriate 



methods for studying the public health effects of conflict 
[41]. Although subject to certain limitations, retrospect- 
ive population-based mortality studies are an important 
tool in conflict epidemiology. Sampling is a challenge in 
such studies and alternatives that improve on the con- 
ventional cluster approach are needed. As Morris & 
Nguyen note in their review of cluster sampling used in 
humanitarian emergencies, we need to "look beyond the 
standard methods for measuring mortality" [6]. Adapting 
conventional cluster sampling and using novel data 
sources, tools, and technologies can improve the overall 
validity of retrospective survey estimates and support 
the feasibility of research in challenging conflict settings. 
The sampling strategy presented here was designed to 
generate a sample representative of the Iraqi population. 
We sought to reduce the potential for bias while consid- 
ering the context specific challenges of the study setting. 
When designing sampling methods for retrospective 
population-based mortality surveys, researchers must 
consider all available methods, options for improving 
and adapting these methods for a particular setting and 
endpoint, and the implications for feasibility and study 
validity. This sampling strategy, or variations on it, are 
adaptable and should be tested in other conflict settings. 
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